On the Entropy of Written Spanish
نویسنده
چکیده
This paper reports on results on the entropy of the Spanish language. They are based on an analysis of natural language for n-word symbols (n = 1 to 18), trigrams, digrams, and characters. The results obtained in this work are based on the analysis of twelve different literary works in Spanish, as well as a 279917 word news file provided by the Spanish press agency EFE. Entropy values are calculated by a direct method using computer processing and the probability law of large numbers. Three samples of artificial Spanish language produced by a first-order model software source are also analyzed and compared with natural Spanish language.
منابع مشابه
Complexity measurement of natural and artificial languages
We compared entropy for texts written in natural languages (English, Spanish) and artificial languages (computer software) based on a simple expression for the entropy as a function of message length and specific word diversity. Code text written in artificial languages showed higher entropy than text of similar length expressed in natural languages. Spanish texts exhibit more symbolic diversit...
متن کاملQuantifying Structure Differences in Literature Using Symbolic Diversity and Entropy Criteria
We measured entropy and symbolic diversity of texts written in English and Spanish. We included texts by Literature Nobel laureates and other famous authors. We formed four groups of texts according to the combinations of language used and the author's Literature Nobel Prize condition. Entropy, symbol diversity and symbol frequency profiles were compared for these four groups. We also built a s...
متن کاملPhonological Awareness Impact on Articulatory Accuracy of the Spanish Liquid [r] in Japanese FL Learners of Spanish
Foreign language learners tend to avoid phonological difficulties and simply transfer sounds whether from their L1 or any pre-existing L2. Phonological awareness (PA) gives students an active role in understanding their own potential in improving pronunciation through several methods. However, such methods are likely to be restricted to only passive learning methods, such as repetition, reading...
متن کاملPhonological Awareness Impact on Articulatory Accuracy of the Spanish Liquid [r] in Japanese FL Learners of Spanish
Foreign language learners tend to avoid phonological difficulties and simply transfer sounds whether from their L1 or any pre-existing L2. Phonological awareness (PA) gives students an active role in understanding their own potential in improving pronunciation through several methods. However, such methods are likely to be restricted to only passive learning methods, such as repetition, reading...
متن کاملAutomatic Recovery of Punctuation Marks and Capitalization Information for Iberian Languages
This paper shows experimental results concerning automatic enrichment of the speech recognition output with punctuation marks and capitalization information. The two tasks are treated as two classification problems, using a maximum entropy modeling approach. The approach is language independent as reinforced by experiments performed on Portuguese and Spanish Broadcast News corpora. The discrimi...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/0901.4784 شماره
صفحات -
تاریخ انتشار 2009